Purpose, Process, Product
We have built everything from net present value and internal rate of return calculators to extreme market event indicators, optimal portfolios, and risk-based capital: all set against the stylized facts of market risk and return. The final project puts it all together for a business context and data set of your choice.
Problem:
Cryptocurrencies first came in to existence in 2009 when bitcoin launched their open source software. Cryptocurrency has multiple advantages. They are encrypted for safe internet transactions and use decentralized control usually using blockchain. With the financial crash of 2008, people were looking to invest in a commodity that was more decentralized away from bank control. For about four years, Bitcoin’s value increased dramatically. Many early purchasers of bitcoin became wealthy from their investiment. Then bitcoin crashed in 2013. People who invested in the trend in 2012 and 2013 ended up losing thousands of dollars. Since then, many more cryptocurrencies have emerged and Bitcoin has recovered in the market place. There are people and businesses that believe cryptocurrency will be way of the future.
For this project, I am consulting for a small business interested in entering to the cryptocurrency market. The would like to know how risky is cryptocurrency and how likely would they be to get a return on their investiment. I will begin to build a comprehensive financial project by comparing three cryptocurrencies; bitcoin, ethereum, and ripple.
Tasks: 1. Examine the volatily of each cryptocurrencies 2. Retrieve and begin to analyze data about potential commodities to diversify into 3. Begin to generate economic scenarios based on events that may, or may not, materialize in the commodities 4. Mitigate their risks by diversifying in Cryptocurrency 5. Identify the optimal combination of Bitcoin, Ethereun, and Ripple to trade.
Key business questions:
More detailed questions:
What are the volatilies of the cryptocurrency?
What are the size and direction of the returns of each cryptocurrency?
Develop a model to optimize the holdings of each of the three commodities.
Run two scenarios: with and without short sales of the commodities.
Interpret results for the company, including tangency portfolio, amount of cash and equivalents in the portfolio allocation, minimum risk portfolio and the risk and return characteristics of each commodity.
Skills:
Understanding how to use the following packages:
Flexdashboard - build dashboards with R
shiny - build interactive web pages with R
QRM - contains models and tools for quantitative risk management
qrmdata - contains functions and datasets for reproducing examples from the book “Quantitative Risk Management: Concepts, Techniques and Tools”
plotly - make interactive graphs, works in conjunction with shiny
psych - contains functions for multivariate analysis, factor analysis, principal component analysis, cluster analysis, and other methods
matrixStats - optimized functions that perform on rows and columns of matrices
moments - functions for calculating Pearson’s kurtosis, Geary’s kurtosis and skewness
zoo - methods for totally ordered indexed observations. Performs calculations containing irregular time series and makes it easy to pass time series objects between zoo and other time series packages. Does not provide modeling functionality. [definition source: https://www.r-bloggers.com/zoo-time-series-exercises]
xts - extensible time series. Enables uniform handling of many R time series classes by extending the zoo package. [definition source: https://www.datacamp.com/community/blog/r-xts-cheat-sheet and also contains a great cheat sheet for how to use this package]
ggplot2 - toolbox for visualization and plotting
quantreg - quantile regression allows us to study the impact of independent variables on different quantiles of dependent variable’s distribution, providing a more complete picture of the relationship between Y and X. This package provides the toolset for performing this type of regression. [definition source: https://rstudio-pubs-static.s3.amazonaws.com/152505_49d1881e3fe64f0bad072282c36a6ca5.html]
mvtnorm allows Multivariate Normal and t Distributions calculations
reshape2 allows easy transformation between wide and long formats (similar to melt, it is in fact based around melt and cast)
scales - aesthetics for graphing
chr [1:928] "Feb 19, 2018" "Feb 18, 2018" "Feb 17, 2018" ...
| mean | median | std_dev | IQR | skewness | kurtosis | |
|---|---|---|---|---|---|---|
| bitcoin | -0.3996 | -0.3306 | 4.1054 | 2.7040 | 0.2797 | 8.3385 |
| ripple | -0.5276 | 0.3445 | 8.1024 | 3.7691 | -3.1195 | 41.5286 |
| ethereum | -0.6227 | 0.0201 | 8.4902 | 6.2411 | 3.5796 | 64.9639 |
Explanation of Process: To get started, in previous R script, I wrangled the cryptocurrency dataset together from a larger dataset acquired from Kaggle. I loaded in the necessary packages (explained in an earlier section). I, then imported the newly wrangled “crypto_data.csv”" using read.csv function. Next, I transformed the data using log and taking the diff or difference of each log. The purpose of transforming the data is to create a normalized datas et. After which, I created a variable called size by taking the absolute values of our transformed data. At this point, I created another variable called direction using an ifelse statement. This will serve as another indicator for skewness. I create two date variables, one in the as.Date mode and the other in character mode. We did this because certain analysis will require our dates to be in different modes in order to run properly. As I continue furturer into the data preprocessing, I made a time series object, which I called data.xts. Then, I used this object to create two new variables, data.zr and returns. Then, as in previous projects I created a table using the moments package. This table shows the descriptives of the newly processed data. Bitcoin has the least skewed distribution, the smallest amount of kurtosis, and the smallest IQR. This means that Bitcoin is the most evenly distrubuted and least variable of the three cryptocurrencies.Bitcoin is the oldest of the three cryptocurrencies. Ethereum is the newest of the cryptocurrencies, and it is also the most variable. The interpretation could be given time the distributions of Ripple and Ethereum could become more even and less variable.
Explanation of Process: To create the historical proice plots, I used the ggplot function to create three simple line graphs. Using a package called cowplot, I was able to plot the three line graphs together. The historical data shows the historical prices of each cryptocurrency from 2015 - 2018. The plots show an overall upwards trend in closing price with a price drop in early 2018. In mid 2017, the closing price seems to steadily grow on all the cryptocurrencies.
Explanation of Process: Using package zoo, I have examined the percentage of change in the cryptocurrencies over time. These currencies show great fluctuatation over time. Even though Bitcoin is the longest existing cryptocurrency, the large changes in market price over time is very similar to that of Ethereum and Ripple. This lends to the notion that these currencies have yet to be stabilized in this market.
Explanation of Process:
In this visualization, I am comparing Bitcoin and Ethereum, two popular cryptocurrencies in order to understand the way market interactions affect returns. To build this model, I used the rolling volatilies of both Bitcoin and Ethereum. I used geom_quantilesto show the data at 95%, 50%, and 5%. With geom_density, I modeled the kernel density of the volatilities in the form of contours. The closer in proximity to one another the contours are, the more densely clustered the volatilities of Bitcoin and Ethereum are to one another. Using a package called magick, I was able to animate the plot to show the yearly volatilities of the two cryptocurrencies.
Explanation of Process Creating a user defined function called corr_rolling, I was able to calculate the rolling correlations of each of the three cryptocurrencies. I then merged the rolling correlations with the tolling volatilities. Then using the seq function to create of a sequence of .05, .95, .05. or multiple taus for quantile regression. The quantile regression will compare the rolling correlation of Bitcoin and ethereum to the volatility of ethereum.
Explanation of Process: For the next three tabs, we be examining the values at Risk for each crytocurrency. First by taking the returns of Bitcoin, I will use this to create a data frame of the returns and the distribution label of historical. Next, I used the function quantile to add quantiles to the plot.Using paste, I plotted the expected shortfall. Then using ggplot, geom_density, geom_vline, and annotate, I created a visualization for Bitcoin’s value at risk. With ggplotly, the plot is inactive. Bicoins expected shortfall is 8.8, and the value at risk is 6.62. The distribution’s highest point is around zero. The right tail of the distribution has a much lower density.
Explanation of Process: Using a similar to the process that created the bitcoin value at risk plot, we can now plot our Ripple value that is at risk. Ripple has a higher value at risk and expected short fall than Bitcoin. Ripple’s distribution as discussed before has much higher kurtosis, giving the distribution a tall and skinny appearence.
Explanation of Process: The last of value at risk plots shows the distribution of the ethereum. Ethereum has the highest expected shortfall and the highest value at risk. The distribution of Ethereum returns left skewed with some extreme right tail outliers.
Explanation of Process: Bitcoin has the lowest value at risk of the three cryptocurrencies. To get a more in depth look at potential loss, I will perform a loss analysis on the Bitcoin data. To begin the loss analysis, I took the most recent prices of Bitcoin, using the function tail. Then, weighted the position of Bitcoin using Bitcoin’s last prices. rowsums, expm1, data.r divided by 100,and multiplied by the calculated weights are used to calculate the loss. median calculates the median of the distribution of the histogram. As in previous section, I use paste to label the value at risk and expected shortfall on the histogram. Then, I plotted the histogram using ggplot and plotly.
Using the calculated loss from the loss analysis section, I can begin to assess for extreme risks. min and max were used to calculate the minium and maxium threholds for the extreme risk plot. Mean exceedances are then calculated. Through the implementation of a for loop to calculate mean excess of threshold and confidence intervals. Then, plot the results with ggplot.
Explanation of Process: GPD or the generalized pareto distribution is needed to estimate historical loss parameters. To calculate the GPD, I set the alpha tolerance to 95%. Next, with the quantiles function to calculate excess with an alpha tolerence of 95%. fit.GPD, then, fit the GPD to the losses. ggplotand ggplotly were utilized to visualized the GPD plot. The GPD value at risk is 12.99 with an eexpected shortfall of 15.64.
Explanation of Process: To find the confidence intervals for the GPD, showRM was used. This produces a complex visualization. Along the x axis the Exceedences are displayed along one y axis the optimization and the other the confidence interval.
R <- returns[, 1:3]/100
quantile_R <- quantile(R[, 1], 0.95) # look at tail of the bitcoin distribution
R <- subset(R, bitcoin > quantile_R,
select = bitcoin:ethereum)
names.R <- colnames(R)
mean.R <- apply(R, 2, mean)
cov.R <- cov(R)
sd.R <- sqrt(diag(cov.R)) ## remember these are in daily percentages
# library(quadprog)
Amat <- cbind(rep(1, 3), mean.R) ## set the equality constraints matrix
mu.P <- seq(0.5 * min(mean.R), 1.5 *
max(mean.R), length = 300) ## set of 300 possible target portfolio returns
# mu.P <- seq(0.5*quantile_R, max(R),
# length = 100) ## set of 300
# possible target portfolio returns
sigma.P <- mu.P ## set up storage for std dev's of portfolio returns
weights <- matrix(0, nrow = 300, ncol = ncol(R)) ## storage for portfolio weights
colnames(weights) <- names.R
for (i in 1:length(mu.P)) {
bvec <- c(1, mu.P[i]) ## constraint vector
result <- solve.QP(Dmat = 2 * cov.R,
dvec = rep(0, 3), Amat = Amat,
bvec = bvec, meq = 2)
sigma.P[i] <- sqrt(result$value)
weights[i, ] <- result$solution
}
sigma.mu.df <- data.frame(sigma.P = sigma.P,
mu.P = mu.P)
mu.free <- 8.22e-05 ## input value of daily risk-free interest rate
## exp(0.03 / 365) - 1 TYX 30 year
## CBOE yield
sharpe <- (mu.P - mu.free)/sigma.P ## compute Sharpe's ratios
ind <- (sharpe == max(sharpe)) ## Find maximum Sharpe's ratio
ind2 <- (sigma.P == min(sigma.P)) ## find the minimum variance portfolio
ind3 <- (mu.P > mu.P[ind2]) ## finally the efficient frontier
col.P <- ifelse(mu.P > mu.P[ind2], "blue",
"grey")
sigma.mu.df$col.P <- col.P Explanation of Process: In this section, we used R code to set up the Markowitz Model. This model is applied to analyze the risk of diversification in the three cryptocurrencies, while allowing the identification and selection of lower risk assets than one single asset alone. In order to set up this model, I used the return variable that I created in data preprocessing. The returns were changed to percentages. Next, I created the Quantile_R variable using the quantile function at 95th quantile. This enabled the visualization of the tail end of the Bitcoin distribution. Next, I used the new Quantile_R vairable to subset the data by examining Bitcoin quantitiles greater than the end of the distribution. This created a new variable called R. Variable R is employed to: 1) create new column names, 2) take the mean of two data points consectutively using the apply function, 3) take the covarience of R, 4) with the covariance of, calculate the standard deviation of R. Next, Amat was calculated using the repeating patterns of mean.r. Amat was used to set the equality constraint of the matrix. Next,the Mu, or performance value was calculated for the model. This was done through the functions min, max, and sequence to set 300 possible target portfolio returns. Then, two variables were set up to be calculated in the future, sigma.P and weights. Now that the placeholder variables were set, a for loop was created to loop through each row of mu.P to calculate both sigma.P (risk value) and weights. Following the for loop, I created a new data frame from the newly calculated sigma value and the mu value called sigma.mu.df. I created a new value called mu.free, which I set to .0000822. mu.free becomes the input value of daily risk-free interest rate. Using mu.P, mu.free and sigma.P, I was able to calculate Sharpe’s ratio, which is the average return that was earned in excess. I used the max function to calculate the maxium value of the Sharpe’s ratio. We used min of sigma to find the minimum variance in the portfolio. Then, I calculated the efficent frontier, which is lowest risk of an expected return. Lastly, I created a value called col.P, which will add colors based on the mu and minium variance value in the visualization on the next tab.
Explanation of Process: Now that I had coded the model, I was able to visualize the results using ggplot2 and plotly. I used the annonate function within ggplot to create a more informative visual with the cryptocurrencies listed on the chart. Using plotly, I created roll over text boxes on the efficent frontier chart, as well.The goal of this visualization to find the optimal profolio. As seen in the plot, the sigma or risk values are on the x and the mu or performace values are on the y axis. The redline shows the mu.free or the input value of daily risk-free interest rate. When portfolio performance is greater than the variation of the portfolio, this is indicated as blue on the graph. When performance is less than our portfolio variation, the line is shown as gray. AS seen in the model, the efficient frontier is between the two points on the graph, which is a very small spread. Bitcoin is the least risky and highest performing of the other two cryptocurrencies. The Markowitz model indicates that Bitcoin portfolio with primarily Bitcoin would be the most favorable investment outcome. Ethereum appeared in the model as high risk and not favorable in returns.
Explanation of Process: To model the portfolio variation, the process was similar to that of the Markowitz model. I set the mu, sigma, weights, and mu.free. This model does not include short sales of the metal commodities. Next, we used ggplot to plot the results, as well, as the annotate function to more clearly label the metal commodities. In this chart, there are large variations among the diffent cryptocurrencies with no short sale present. The effecient frontier is very small and is in favor of Bitcoin.
Explanation of Process:
This chart, again, examined the portfolio variation, this time with short sales included. ggplot and ggplotly were used to visualize the results. The efficient Frontier becomes larger and shows the benefit of diversifying with Bitcoin and Ripple. Ethereum is not favorable to include in the portfolio.
| mean | median | std_dev | IQR | skewness | kurtosis | |
|---|---|---|---|---|---|---|
| actual | 2.7859 | 2.8148 | 0.0792 | 0.0658 | -2.2278 | 9.822 |
| predicted | 3.0467 | 2.9825 | 0.3898 | 0.4836 | 0.9885 | 4.570 |
| residuals | -0.2609 | -0.1903 | 0.4012 | 0.4823 | -1.0002 | 4.584 |
Explanation of Process: In this section, bootstrapping methods were applied in order to get a confidence interval for the Sharpe’s ratio. In this table, the differences between the actual and predicted are displayed across different descriptives. To display this table, I used two new dataframes, one called out_SHORT and the other called out_SUMMARY. out_short is used to create the vairables of actual, predicted, and residuals. Then, data_moments is used to make the summary table. In the table, the predicted kurtosis and skewness of the cyptocurrency distrubtuion are not very accurate. In the following graph, we will model these results between predicted and observed.
Explanation of Process: In this section, we model our confidence interval for the Sharpe’s ratio. To do this, I took the minium and maxium of the actual and predicted values; these values served as the x and y limits for the plot. Then, melt was used to reshape the results. When plotted, the residuals between the actual and predicted have a negative trend.
Explanation of Process: using ggplot, we were able to plot the actual and predicted sharpe’s ratios in order to compare the two. We use geom_density to show the clustering of the Sharpe’s Ratio points.
Conclusions:
Cryptocurrencies are a relatively new commodity in the market place. From the analysis of volatities, cryptocurrencies are relativelt volatile. Bitcoin is the least volatile and has also been in the market place the longest of thee cryptocurrencies. By examining the historical price data of cryptocurrencies, there has been a positive growth in cryptocurrencies. There is risk with the volatility of cryptocurrency in generally. Bitcoin has the lowest risk and highest likilihood for positive returns. Bitcoin is also the most expensive cryptocurrency. Ripple is the lowest cost of the cryptocurrencies; however, it would require the acquisition of far more units of Ripple tobe equivalent to Bitcoin.To mitigate risk, I would recommend that my client not invest in Ethereum. Ethereum is the most volalitile with highest risk and lowest returns. Ethereum is the newest of the three currencies and has not had as much time a Bitcoin to gain stabilization. If my client felt comfortable with the risks associated with investing in cryptocurrency, I would recommend that the company invest primarily in Bitcoin with a small portion of the portofolio to include Ripple. Overall, Bitcoin is the least risky investiment.
References:
Foote, W. (2018). Financial Engineering Analytics: A Practice Manual Using R (Vol. 1.3). https://bookdown.org/wfoote01/faur/
Hayes, A. (2017, November 17). Sharpe Ratio. Retrieved from https://www.investopedia.com/terms/s/sharperatio.asp
Kumar, S. R. Cryptocurrency Historical Prices. Retrieved from https://www.kaggle.com/sudalairajkumar/cryptocurrencypricehistory *data set used for project is a subset of Kumar’s Kaggle data set
Pollock, D. Five Bitcoin Crashes and What You Can Learn From Them. Retrieved from https://cointelegraph.com/news/five-bitcoin-crashes-and-what-you-can-learn-from-them
Staff, I. (2018, June 18). Efficient Frontier. Retrieved from https://www.investopedia.com/terms/e/efficientfrontier.asp
The Harry Markowitz Model & MPT Assumptions. Retrieved from https://www.ifcm.capital/modern-portfolio-theory/harry-markowitz-model/